Encoding Spatial Distribution of Convolutional Features for Texture Representation
However, global average pooling (GAP) cannot well characterize the complex spatial distributions of convolutional features, yet such patterns play an important role in texture-oriented applications, e.g., material recognition and ground terrain classification. In the context of texture representation, this paper addresses the issue by proposing Fractal Encoding (FE), a feature encoding module grounded in multi-fractal geometry. Viewing a CNN feature map as a union of level sets of points lying in 2D space, FE characterizes their spatial layout via a local-global hierarchical fractal analysis that examines the multi-scale power-law behavior on each level set. This enables a CNN to encode the regularity of the spatial arrangement of image features, yielding a robust yet discriminative spectrum descriptor. In addition, FE has trainable parameters for data adaptivity and can be easily incorporated into existing CNNs for end-to-end training. We apply FE to ResNet-based texture classification and retrieval and demonstrate its effectiveness on several benchmark datasets.
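The fractal analysis sketched above can be illustrated with a minimal box-counting estimate of the fractal dimension of one level set of a feature map. This is a hedged sketch of the general idea, not the paper's FE module; the function name and the choice of scales are illustrative assumptions.

```python
import numpy as np

def box_counting_dimension(mask, scales=(1, 2, 4, 8)):
    """Estimate the fractal dimension of a non-empty binary level set.

    mask: 2D boolean array marking the points of one level set of a
    feature map. For each box side length s, count the boxes that
    contain at least one point; the slope of log(count) vs. log(scale)
    estimates the (negated) fractal dimension.
    """
    h, w = mask.shape
    counts = []
    for s in scales:
        n = 0
        for i in range(0, h, s):
            for j in range(0, w, s):
                if mask[i:i + s, j:j + s].any():
                    n += 1
        counts.append(n)
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return -slope
```

A fully occupied 2D level set has dimension close to 2, while an isolated point has dimension close to 0; FE-style descriptors exploit such differences across level sets.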
We are very grateful to the reviewers for their comments on our submission. First, we would like to clarify the position of this work within machine learning research. Prior to this work, the literature on attention-based RNNs lacked an analysis of the applicability of the approach to long input sequences, and most importantly to sequences longer than those seen during training. Our work fills this important gap: it presents an analysis of the drawbacks of the existing approaches and proposes an effective and generic solution.
SeisLM: a Foundation Model for Seismic Waveforms
Liu, Tianlin, Münchmeyer, Jannes, Laurenti, Laura, Marone, Chris, de Hoop, Maarten V., Dokmanić, Ivan
We introduce the Seismic Language Model (SeisLM), a foundational model designed to analyze seismic waveforms -- signals generated by Earth's vibrations such as the ones originating from earthquakes. SeisLM is pretrained on a large collection of open-source seismic datasets using a self-supervised contrastive loss, akin to BERT in language modeling. This approach allows the model to learn general seismic waveform patterns from unlabeled data without being tied to specific downstream tasks. When fine-tuned, SeisLM excels in seismological tasks like event detection, phase-picking, onset time regression, and foreshock-aftershock classification. The code has been made publicly available on https://github.com/liutianlin0121/seisLM.
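The self-supervised contrastive objective mentioned above can be illustrated with a generic InfoNCE-style loss, a common choice for contrastive pretraining. This is an illustrative assumption, not SeisLM's exact objective: matching rows of two embedding batches act as positives, all other rows as negatives.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """Generic InfoNCE loss over two (N, D) batches of embeddings.

    Row i of z1 and row i of z2 are a positive pair; every other row
    of z2 serves as a negative for row i of z1.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau          # cosine similarities, temperature-scaled
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)            # row-wise softmax
    return -np.log(np.diag(p)).mean()            # NLL of the positive pairs
```

Correctly matched views give a much lower loss than mismatched ones, which is what drives the representation learning.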
Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques
Meng, Benyuan, Xu, Qianqian, Wang, Zitai, Yang, Zhiyong, Cao, Xiaochun, Huang, Qingming
Diffusion models are powerful generative models, and this capability can also be applied to discrimination: the inner activations of a pre-trained diffusion model can serve as features for discriminative tasks, namely, diffusion features. We discover that diffusion features have been hindered by a hidden yet universal phenomenon that we call content shift. To be specific, there are content differences between the features and the input image, such as the exact shape of a certain object. We trace the cause of content shift to an inherent characteristic of diffusion models, which suggests the broad existence of this phenomenon in diffusion features. Further empirical study also indicates that its negative impact is not negligible even when content shift is not visually perceivable. Hence, we propose to suppress content shift to enhance the overall quality of diffusion features. Specifically, content shift is related to the information drift during the process of recovering an image from the noisy input, pointing to the possibility of turning off-the-shelf generation techniques into tools for content shift suppression. We further propose a practical guideline named GATE to efficiently evaluate the potential benefit of a technique, and we provide an implementation of our methodology. Despite its simplicity, the proposed approach achieves superior results on various tasks and datasets, validating its potential as a generic booster for diffusion features. Our code is available at https://github.com/Darkbblue/diffusion-content-shift.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection.
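The RPN's per-position reference boxes ("anchors") can be sketched as follows. This is a minimal illustration of multi-scale, multi-ratio anchor generation; the base size, scales, and aspect ratios are illustrative assumptions rather than a faithful reproduction of the paper's configuration.

```python
import numpy as np

def generate_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Produce k = len(scales) * len(ratios) anchors centered at the origin.

    Each anchor is an (x1, y1, x2, y2) box. All anchors sharing a scale
    have the same area; the ratio (h / w) varies the shape.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            area = (base_size * scale) ** 2
            w = np.sqrt(area / ratio)
            h = w * ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)
```

At test time, these k reference boxes are replicated at every spatial position of the shared feature map, and the RPN regresses offsets and objectness scores relative to them.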
Masked World Models for Visual Control
Seo, Younggyo, Hafner, Danijar, Liu, Hao, Liu, Fangchen, James, Stephen, Lee, Kimin, Abbeel, Pieter
Visual model-based reinforcement learning (RL) has the potential to enable sample-efficient robot learning from visual observations. Yet the current approaches typically train a single model end-to-end for learning both visual representations and dynamics, making it difficult to accurately model the interaction between robots and small objects. In this work, we introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning. Specifically, we train an autoencoder with convolutional layers and vision transformers (ViT) to reconstruct pixels given masked convolutional features, and learn a latent dynamics model that operates on the representations from the autoencoder. Moreover, to encode task-relevant information, we introduce an auxiliary reward prediction objective for the autoencoder. We continually update both autoencoder and dynamics model using online samples collected from environment interaction. We demonstrate that our decoupling approach achieves state-of-the-art performance on a variety of visual robotic tasks from Meta-world and RLBench, e.g., we achieve 81.7% success rate on 50 visual robotic manipulation tasks from Meta-world, while the baseline achieves 67.9%. Code is available on the project website: https://sites.google.com/view/mwm-rl.
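The feature-masking step described above can be sketched minimally as random spatial dropout of a convolutional feature map before reconstruction. The mask ratio and layout here are illustrative assumptions, not the paper's exact masking scheme.

```python
import numpy as np

def mask_features(feat, mask_ratio=0.75, seed=0):
    """Zero out a random subset of spatial positions of an (H, W, C)
    convolutional feature map.

    Returns the masked map and the boolean keep mask; an autoencoder
    would then be trained to reconstruct pixels from the kept features.
    """
    rng = np.random.default_rng(seed)
    h, w, _ = feat.shape
    keep = rng.random((h, w)) >= mask_ratio
    return feat * keep[..., None], keep
```

Masking entire feature columns (all channels at a spatial position) forces the reconstruction model to rely on context, which is the point of masked pretraining.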
Logic Rules Meet Deep Learning: A Novel Approach for Ship Type Classification
Pitsikalis, Manolis, Do, Thanh-Toan, Lisitsa, Alexei, Luo, Shan
The shipping industry is an important component of global trade and the economy; however, to ensure legal compliance and safety, it needs to be monitored. In this paper, we present a novel ship type classification model that combines vessel data transmitted via the Automatic Identification System with vessel imagery. The main components of our approach are the Faster R-CNN deep neural network and a neuro-fuzzy system with IF-THEN rules. We evaluate our model using real-world data, showcase the advantages of this combination, and compare it with other methods. Results show that our model can increase prediction scores by up to 15.4% compared with the next best model we considered, while also maintaining a level of explainability, as opposed to common black-box approaches.
Pretrained equivariant features improve unsupervised landmark discovery
Rahaman, Rahul, Ghosh, Atin, Thiery, Alexandre H.
Locating semantically meaningful landmark points is a crucial component of a large number of computer vision pipelines. Because of the small number of available datasets with ground truth landmark annotations, it is important to design robust unsupervised and semi-supervised methods for landmark detection. Many of the recent unsupervised learning methods rely on the equivariance properties of landmarks to synthetic image deformations. Our work focuses on such widely used methods and sheds light on its core problem, its inability to produce equivariant intermediate convolutional features. This finding leads us to formulate a two-step unsupervised approach that overcomes this challenge by first learning powerful pixel-based features and then use the pre-trained features to learn a landmark detector by the traditional equivariance method. Our method produces state-of-the-art results in several challenging landmark detection datasets such as the BBC Pose dataset and the Cat-Head dataset. It performs comparably on a range of other benchmarks.
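The equivariance property these methods rely on, that the features of a deformed image equal the deformed features of the original image, can be verified for a simple translation-equivariant operation. This is a toy check under illustrative assumptions (circular translations and a 3x3 mean filter), not the paper's detector.

```python
import numpy as np

def shift(x, dy, dx):
    # circular translation of a 2D map
    return np.roll(np.roll(x, dy, axis=0), dx, axis=1)

def local_mean(x):
    # 3x3 average with circular padding: a translation-equivariant "feature"
    return sum(shift(x, dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0

def equivariance_error(x, dy, dx):
    # || f(T x) - T f(x) ||_inf; zero (up to float error) when f is
    # equivariant to the translation T
    return np.abs(local_mean(shift(x, dy, dx)) - shift(local_mean(x), dy, dx)).max()
```

For translations, convolution-style features satisfy this exactly; the paper's observation is that features learned by the usual end-to-end equivariance objective fail an analogous check for general deformations.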